Quantitative Biology
○ Wiley
Preprints posted in the last 90 days, ranked by how well they match Quantitative Biology's content profile, based on 11 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Cui, T.; Wang, Z.; Wang, T.
AI-based molecular dynamics simulation brings ab initio calculations to biomolecules efficiently, with the machine learning force field (MLFF) playing the central role by accurately predicting molecular energies and forces. Most existing MLFFs assume localized interatomic interactions, limiting their ability to accurately model non-local interactions, which are crucial in biomolecular dynamics. In this study, we introduce ViSNet-PIMA, which efficiently learns non-local interactions through a physics-informed multipole aggregator (PIMA) and accurately encodes molecular geometric information. ViSNet-PIMA outperforms all state-of-the-art MLFFs for energy and force predictions across different kinds of biomolecules and various conformations on the MD22 and AIMD-Chig datasets, while adapting the PIMA blocks into other MLFFs further achieves 55.1% performance gains, demonstrating the superiority of ViSNet-PIMA and the universality of the model design. Furthermore, we propose AI2BMD-PIMA, which incorporates ViSNet-PIMA into the AI2BMD simulation program by introducing a "Transfer Learning-Pretraining-Finetuning" scheme and replacing molecular mechanics-based non-local calculations among protein fragments with ViSNet-PIMA; this reduces AI2BMD's energy and force calculation errors by more than 50% for different protein conformations and for protein folding and unfolding processes. ViSNet-PIMA advances ab initio calculation for entire biomolecules, amplifying the application value of AI-based molecular dynamics simulations and property calculations in biochemical research.
Yan, J.; Wu, Q.; Li, Y.; Cai, J.; Zhou, M.; Campbell-Valois, F.-X.; Siu, S. W.
Cancer remains a major global health threat, with its incidence and mortality rates consistently rising in recent years. Anticancer peptides (ACPs) are short amino acid chains that can inhibit the growth or spread of cancer cells. Compared to traditional treatments, ACPs are a promising class of potential cancer therapies due to their multiple mechanisms, potential for combination cancer therapy, enhanced immune function, lower toxicity to normal tissues, fewer side effects, and less drug resistance. Although it is necessary to explore novel ACPs, traditional wet-lab methods for selecting them are labor-intensive, time-consuming, and expensive. To accelerate the discovery of novel ACPs, we propose Diffusion-ACP39, a latent diffusion-based generative model with a synchronized seed autoencoder for anticancer peptide design, capable of generating novel peptides with lengths ranging from 5 to 39 amino acids. Furthermore, we developed RF-ACP39, a random forest classifier, to assess the generative power of Diffusion-ACP39. In an evaluation of 10,000 generated peptides with RF-ACP39, Diffusion-ACP39 achieved an accuracy of 94.5%. We also qualitatively analyzed the differences among true ACPs, random sequences, random peptides, and generated ACPs, demonstrating that the generated ACPs are most similar to true ACPs.
Wu, Y.
Intercellular communication is governed by the spatiotemporal dynamics of protein complexes at the cell-cell interface. However, conventional static interaction models fail to incorporate key physical constraints, such as steric hindrance, spatial compartmentalization, and dimensionality reduction that regulate complex assembly in vivo. To bridge the gap between static network topology and dynamic systems biology, we developed a multi-scale computational framework. We first identified a highly conserved, Fibroblast Growth Factor Receptor 1 (FGFR1)-centered cell adhesion and signaling motif by analyzing a diverse set of human cell-cell interfaces. We then constructed a multi-layer spatial stochastic simulator to recapitulate and interrogate the dynamic behavior of this network motif at cell-cell interfaces. Atomic-resolution structural models of the protein complexes within the motif were further generated using AlphaFold to define interaction rules for the stochastic simulations by categorizing binding interfaces. Our results show that the structural arrangement of cell-cell adhesion complexes controls how FGFR1 receptors cluster at the cell-cell interface, effectively dividing the membrane into distinct functional microdomains. Competition from decoy receptors further regulates this process by capturing receptors before they can participate in signaling. Even small changes in binding affinity can therefore alter receptor organization and disrupt normal signal transduction, which may contribute to human disease. By integrating macro-scale interactomics, atomic-level structural bioinformatics, and mesoscale stochastic modeling, this study reveals how structural interaction rules, combined with spatial constraints, shape the formation and function of intercellular signaling networks.
Waema, R.; Adongo, C.; Lago, S.; Ogutu, K.
Human immunodeficiency virus (HIV) persistence remains a major barrier to cure due to the existence of long-lived latent reservoirs that evade immune clearance and persist despite combination antiretroviral therapy (ART). Although ART effectively suppresses viral replication, treatment interruption often leads to rapid viral rebound originating from these latent reservoirs. In this study, we develop a deterministic mathematical model describing the in vivo dynamics of HIV infection incorporating uninfected CD4+ T cells, infected cells, latent reservoirs, deep latent reservoirs, and infectious and non-infectious virions, while explicitly accounting for the therapeutic effects of reverse transcriptase inhibitors (RTIs), protease inhibitors (PIs), and Tat transcription inhibitors. Analytical results establish positivity and boundedness of solutions and derive the effective reproduction number Re using the next-generation matrix approach. Stability analysis shows that the virus-free equilibrium is locally asymptotically stable when Re < 1, while viral persistence occurs when Re > 1. Numerical simulations were performed to investigate therapy interactions, viral rebound following treatment interruption, and the impact of drug efficacy on viral set-points and latent reservoir dynamics. To further explore therapy interactions, three-dimensional viral set-point surfaces and heat maps were generated to examine how combinations of infection inhibition, viral production inhibition, and transcriptional inhibition influence viral dynamics. The simulations reveal that Tat inhibition suppresses viral transcription, thereby reducing the transition of infected cells into productive infection and limiting viral propagation when combined with conventional ART mechanisms. 
The therapy parameter planes further demonstrate that strong transcriptional inhibition promotes the transition of infected cells into deep latency, supporting the emerging block-and-lock strategy for functional HIV cure. In addition, a three-dimensional eradication boundary surface and therapy cube were constructed to identify regions of parameter space where Re < 1, corresponding to successful viral control. These visualizations show that viral eradication is unlikely when therapies act independently but becomes achievable when multiple therapeutic mechanisms act simultaneously. Overall, the results highlight the critical role of transcriptional inhibition through Tat-targeting therapies in complementing existing ART regimens. By simultaneously suppressing viral replication and promoting deep latency, Tat-based combination strategies may significantly reduce viral rebound and contribute to long-term functional control of HIV infection.
Wang, D.; Froehlich, F.; Stapor, P.; Schaelte, Y.; Huth, M.; Eils, R.; Kallenberger, S.; Hasenauer, J.
Experimental methods for characterizing single cells and cell populations have improved tremendously over the past decades. This progress has enabled the development of quantitative, mechanistic models for cellular processes based on either single-cell or bulk data. However, coherent statistical frameworks for the model-based integration of different data types at the single-cell and population levels are still missing. In this work, we present a mathematical modeling approach for integrating single-cell time-lapse, single-cell snapshot, single-cell time-to-event, and population-average data. Utilizing a formulation based on nonlinear mixed-effect modeling, we enable the description of multiple data types, with and without single-cell resolution. Furthermore, we propose a tailored parameter estimation scheme that facilitates the assessment of underlying process parameters. Our study demonstrates that the proposed approach can reliably integrate diverse data types, thereby improving parameter identifiability and prediction accuracy. Applying this framework to extrinsic apoptosis reveals that simultaneously considering multiple data types can be essential, particularly when experimental constraints limit data availability. The proposed approach is broadly applicable and may significantly advance our understanding of complex biological processes.
Su, H.; Liang, Y.; Xiao, W.; Li, H.; Liu, X.; Yang, Z.; Yuan, M.; Liu, X.
The escalating crisis of antimicrobial resistance necessitates novel therapeutic strategies, among which drug combination therapy shows great promise by enhancing efficacy and reducing toxicity. However, identifying effective synergistic pairs from the vast combinatorial space remains experimentally challenging and resource-intensive. To address this, we introduce GCN-Mamba, a deep learning framework that integrates Graph Convolutional Networks (GCN) with the Mamba State Space Model. This architecture captures both local molecular topological structures and global implicit interactions by leveraging Extended 3-Dimensional Fingerprints (E3FP) and bacterial gene expression profiles. Evaluation on a comprehensive dataset demonstrated that GCN-Mamba significantly outperforms classical machine learning models in predictive accuracy. In a targeted case study against Methicillin-resistant Staphylococcus aureus (MRSA), the model successfully rediscovered known synergistic pairs, such as Quercetin and Curcumin, consistent with recent literature. Furthermore, prospective in vitro validation confirmed a novel synergistic combination of Shikimic acid and Oxacillin, validating the model's practical utility. By efficiently prioritizing potential candidates, GCN-Mamba serves as a powerful and reliable tool for accelerating the discovery of synergistic antimicrobial combinations, effectively bridging the gap between computational prediction and experimental validation.
Teshirogi, Y.; Terada, T.
Molecular dynamics (MD) simulations are a powerful tool for investigating biomolecular dynamics underlying biological functions. However, the accessible spatiotemporal scales of conventional all-atom simulations remain limited by high computational costs. Coarse-graining reduces these costs by decreasing the number of interaction sites and enabling longer timesteps. In extreme cases, proteins are represented as single spherical particles; while such approximations facilitate cellular-scale simulations, they often sacrifice essential structural information, such as molecular shape and interaction anisotropy. Here, we present CGRig, a rigid-body protein model with residue-level interaction sites designed for long-time, large-scale simulations. In CGRig, each protein is treated as a single rigid body embedding residue-level interaction sites. Its translational and rotational motions are described by the overdamped Langevin equation incorporating a shape-dependent friction matrix. Intermolecular interactions are calculated using Gō-like native contact potentials, Debye-Hückel electrostatics, and volume exclusion. We validated that CGRig accurately reproduces the translational and rotational diffusion coefficients expected from the friction matrix for an isolated protein. For dimeric systems, the model successfully maintained native complex structures. Furthermore, two initially separated proteins converged into the correct complex with an association rate consistent with all-atom simulations. Notably, CGRig achieved a simulation performance exceeding 17 s/day for a 1,024-molecule system. These results demonstrate that CGRig provides an efficient framework for simulating protein assembly while retaining residue-level interaction specificity, making it a valuable tool for investigating large-scale biomolecular self-assembly.
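To make the overdamped Langevin picture mentioned in this abstract concrete, here is a minimal Python sketch of a single Brownian-dynamics (Euler-Maruyama) update. It is not CGRig's scheme: it assumes an isotropic scalar diffusion coefficient rather than a shape-dependent friction matrix, handles translation only (no rotation), and uses reduced units. The function names and parameter values are hypothetical.

```python
import math
import random


def brownian_step(pos, force, diff_coeff, kT, dt, rng):
    """One Euler-Maruyama step of overdamped Langevin dynamics.

    Assumption: isotropic diffusion, so each coordinate is updated as
    x += (D / kT) * F * dt + sqrt(2 * D * dt) * xi, with xi ~ N(0, 1).
    """
    sigma = math.sqrt(2.0 * diff_coeff * dt)
    return tuple(
        x + (diff_coeff / kT) * f * dt + sigma * rng.gauss(0.0, 1.0)
        for x, f in zip(pos, force)
    )


def mean_squared_displacement(n_particles, n_steps, diff_coeff, dt, seed=0):
    """Average |r(t) - r(0)|^2 over independent force-free particles."""
    rng = random.Random(seed)
    zero_force = (0.0, 0.0, 0.0)
    total = 0.0
    for _ in range(n_particles):
        pos = (0.0, 0.0, 0.0)
        for _ in range(n_steps):
            pos = brownian_step(pos, zero_force, diff_coeff, 1.0, dt, rng)
        total += sum(x * x for x in pos)
    return total / n_particles
```

For a free particle in three dimensions the mean squared displacement should approach 6 D t, which gives a simple check analogous in spirit to the diffusion-coefficient validation the abstract describes.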
Wang, Q.; Shi, X.
Accurate prediction of drug synergy is paramount for developing effective combination therapies and advancing personalized medicine. Although methods based on graph neural networks (GNNs) have become a prevalent approach, they often treat molecules as flat graphs of connected atoms, thus overlooking their inherent hierarchical structure (i.e., atoms forming functional groups) and the critical topological information that governs molecular interactions. To address this limitation, we introduce TopoFuseNet, a novel hierarchical graph representation learning framework that integrates multi-scale topological features. The core innovations of TopoFuseNet include: 1) The first-ever application of "Group Centrality" from network science to cheminformatics, enabling the identification and quantification of functional groups crucial to drug activity; 2) A systematic, multi-path strategy to seamlessly integrate node-level (atom) and group-level (functional group) topological features into a Graph Attention Network (GAT) via feature augmentation, attention biasing, and hierarchical pooling; 3) A Differential Transformer module to deeply fuse multi-modal features learned from sequences, fingerprints, and our proposed hierarchical graph representations. Extensive experiments on two large-scale benchmark datasets, DrugComb and DrugCombDB, demonstrate that TopoFuseNet significantly outperforms state-of-the-art methods across multiple key metrics, including AUC, AUPRC, and F1-score, while exhibiting exceptional generalization robustness under various stringent cold-start scenarios. In-depth ablation studies further confirm the effectiveness and necessity of each proposed innovative module.
Furthermore, multi-scale interpretability analysis and zero-shot cross-domain transfer experiments reveal that the model successfully captures molecular interaction rules with clear pharmacological significance, demonstrating immense practical potential for discovering novel combination therapies through large-scale virtual screening. Our work not only delivers a superior model for drug synergy prediction, but more importantly, it establishes a novel and scalable paradigm for effectively integrating hierarchical molecular structures and topological information into GNNs.
Chattaraj, A.; Kanovich, D. S.; Ranganathan, S.; Shakhnovich, E. I.
Phase-separated condensates are recognized as a ubiquitous mechanism of spatial organization in cell biology. Biophysical modeling of condensates provides critical insights into the dynamics and functions of these subcellular structures that are difficult to extract via experiments. Here we present an efficient computational pipeline, CASPULE (Condensate Analysis of Sticker Spacer Polymers Using the LAMMPS Engine), to simulate and analyze biological condensates made of sticker-spacer polymers. CASPULE implements a unique force field that combines traditional Langevin dynamics with a "detailed balance proof" protocol for single-valent bond formation between stickers. This framework allows us to study the non-trivial biophysics that emerges from single-valent sticker interactions coupled with the separation of energetic contributions by stickers and spacers. We provide detailed documentation on how to set up the simulation environment, perform simulations, and analyze the results. Through case studies, we highlight the utility and efficacy of our pipeline. Importantly, we provide statistical parameters to characterize the cluster size distribution often observed in biological systems. We envision this tool to be broadly useful in decoding the interplay of kinetics and thermodynamics underlying the formation and function of biological condensates.
Ben-Joseph, J.
Lightweight epidemic calculators are widely used for teaching and rapid scenario exploration, yet many omit the methodological detail needed for scientific reuse. We present a browser-native SIR calculator that exposes forward Euler and classical fourth-order Runge-Kutta (RK4) integration alongside epidemiologically interpretable outputs and a population-conservation diagnostic. The implementation is anchored to analytical properties of the deterministic SIR system, including the epidemic threshold, the peak condition, and the final-size relation. Benchmark experiments show that RK4 is essentially step-size invariant over practical discretizations, whereas Euler at a coarse one-day step overestimates peak prevalence by 3.97% and final size by 0.66% relative to a fine-step RK4 reference. These results demonstrate that browser-based tools can support publication-quality computational narratives when solver choice, diagnostics, and assumptions are treated as first-class outputs.
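As an illustration of the solver comparison described in this abstract, the Python sketch below integrates the deterministic SIR system with forward Euler and classical RK4 and reports peak prevalence, final size, and a population-conservation drift. It is not the calculator's code: the parameter values (beta = 0.3/day, gamma = 0.1/day, N = 1,000,000, 10 initial infections) and all function names are assumptions chosen for illustration, so the percentages it produces will not match those reported above.

```python
def sir_rhs(state, beta, gamma, n):
    """Right-hand side of the SIR equations for state = (S, I, R)."""
    s, i, _ = state
    new_infections = beta * s * i / n
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def euler_step(state, dt, beta, gamma, n):
    """Forward Euler: y_next = y + dt * f(y)."""
    k1 = sir_rhs(state, beta, gamma, n)
    return tuple(y + dt * k for y, k in zip(state, k1))


def rk4_step(state, dt, beta, gamma, n):
    """Classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(y + h * d for y, d in zip(state, k))

    k1 = sir_rhs(state, beta, gamma, n)
    k2 = sir_rhs(shifted(k1, dt / 2), beta, gamma, n)
    k3 = sir_rhs(shifted(k2, dt / 2), beta, gamma, n)
    k4 = sir_rhs(shifted(k3, dt), beta, gamma, n)
    return tuple(
        y + dt / 6 * (a + 2 * b + 2 * c + d)
        for y, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(step, dt, days, beta=0.3, gamma=0.1, n=1_000_000.0, i0=10.0):
    """Run one solver and collect the summary outputs."""
    state = (n - i0, i0, 0.0)
    peak = state[1]
    for _ in range(round(days / dt)):
        state = step(state, dt, beta, gamma, n)
        peak = max(peak, state[1])
    return {
        "peak": peak,                    # largest I seen on the time grid
        "final_size": state[2],          # R at the end of the run
        "drift": abs(sum(state) - n),    # conservation diagnostic: |S+I+R-N|
    }


if __name__ == "__main__":
    reference = simulate(rk4_step, 0.01, 300)
    for name, step in (("euler", euler_step), ("rk4", rk4_step)):
        coarse = simulate(step, 1.0, 300)
        rel_peak = (coarse["peak"] - reference["peak"]) / reference["peak"]
        print(f"{name}: peak error {rel_peak:+.3%}, drift {coarse['drift']:.2e}")
```

Both steppers preserve S + I + R up to rounding because the three derivatives sum to zero, so the drift diagnostic mainly guards against implementation mistakes rather than solver error.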
Zhang, H.; Zheng, G.; Xu, Z.; Zhao, H.; Cai, S.; Huang, Y.; Zhou, Z.; Wei, Y.
Missense variants are a common type of genetic mutation that can alter the structure and function of proteins, thereby affecting the normal physiological processes of organisms. Accurately distinguishing damaging missense variants from benign ones is of great significance for clinical genetic diagnosis, treatment strategy development, and protein engineering. Here, we propose the VarDCL method, which integrates multimodal protein language model embeddings and self-distilled contrastive learning to identify subtle sequence and structural differences before and after protein mutations, thereby accurately predicting pathogenic missense variants. First, leveraging sequence and structural information before and after mutations, VarDCL generates sequence-structural multimodal features via different language models. It incorporates both global and local perspectives of feature embeddings to provide the model with dynamic, multimodal, and multi-view input data. Additionally, a Self-distilled Contrastive Learning (SDCL) module is proposed to enable more effective information integration and feature learning, enhancing the model's ability to detect sequence and structural changes induced by mutations. Within this module, the multi-level contrastive learning framework excels at capturing information differences before and after mutations within the same modality; meanwhile, the feature self-distillation mechanism effectively utilizes high-level fused features to guide the learning of low-level differential features, facilitating information interaction across different modalities. The VarDCL framework not only ensures the model's capacity to learn dynamic changes pre- and post-mutation but also significantly improves cross-modal information interaction between sequence and structure, thereby remarkably boosting the model's performance in distinguishing pathogenic mutations from benign ones.
To validate the effectiveness of VarDCL, extensive experiments were conducted. The ablation study demonstrates that all key components of VarDCL contribute significantly. On an independent test set containing 18,731 clinical variants, VarDCL achieved an AUC of 0.917, an AUPR of 0.876, an MCC of 0.690, and an F1-score of 0.789, outperforming 21 state-of-the-art existing methods. Benchmark analysis shows that VarDCL can be utilized as an accurate and potent tool for predicting missense variant effects.
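For readers less familiar with two of the metrics quoted in this abstract, the following Python sketch computes the Matthews correlation coefficient (MCC) and the F1-score from binary confusion-matrix counts using their standard definitions. The function names and the example counts are hypothetical and are not taken from the VarDCL evaluation.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for a pathogenic-vs-benign classifier.
    print(f1_score(tp=40, fp=5, fn=10), mcc(tp=40, tn=45, fp=5, fn=10))
```

Unlike F1, MCC also uses the true-negative count, which is one reason the two are often reported together on imbalanced variant sets.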
Yi, J.; Liu, J.; Guo, P.; Ye, Y.-n.; Zhou, X.
Rapid advances in single-cell RNA sequencing (scRNA-seq) technology have enabled the investigation of gene expression changes at the single-cell level, particularly for elucidating the heterogeneity among cells and complex biological processes. This technique reveals subtle molecular differences within individual cells, thereby offering a unique viewpoint for the investigation of cell cycle progression, cellular differentiation, and disease pathogenesis. However, accurately identifying and analyzing cell cycle dynamics in scRNA-seq data remains challenging due to the complexity of the data and the subtle differences between cell states. To address this challenge, we developed the integrated Sinusoidal and Piecewise AutoEncoder (SPAE), an autoencoder-based piecewise linear model, for characterizing the cell cycle dynamics and cell states in scRNA-seq data. Compared with existing methods, SPAE demonstrates substantially improved accuracy and robustness in cell cycle characterization. Additionally, SPAE can accurately predict cancer cell cycle transitions and effectively facilitate the removal of cell cycle effects from gene expression data. SPAE is available for non-commercial use at https://github.com/YaJahn/SPAE.
Okochi, Y.; Sawazaki, Y.; Kondo, Y.; Naoki, H.
Cell division is fundamental to multicellular organisms and stochastic partitioning of cellular components can strongly affect genome-wide gene expression states. However, how cell division-associated partitioning noise shapes the dynamics of proliferating cells is poorly understood. Here, we propose scDIVIDE, a neural stochastic differential equation framework to infer continuous cellular dynamics and division rates while accounting for partitioning noise. We combined birth-death-mutation processes from population genetics with dynamical optimal transport and revealed that the birth rate is embedded in the diffusion coefficient, enabling its inference from time-series scRNA-seq data. scDIVIDE accurately inferred birth rates in synthetic data and the inferred birth rates recapitulated turnover-related programs in mouse hematopoiesis data. By exploiting the birth-diffusion coupling, scDIVIDE provides a biologically-informed constraint on growth rate estimation, outperforming existing methods in predicting future cell distributions. scDIVIDE provides a conceptual avenue for quantitatively dissecting how partitioning noise shapes fate decisions in multicellular systems.
Li, J.; Zhao, Z.; Rui, J.; Zhao, J.; Luo, Q.; Li, K.; Song, W.; Perez, S.; Frutos, R.; Su, Y.; Chen, Q.; Xiang, T.; Chen, T.
Against the backdrop of global climate change and accelerating population mobility in 2025, chikungunya fever (CHIKF) exhibited a trend of worldwide spread, significantly increasing the difficulty of controlling tropical mosquito-borne diseases. To enhance the precision of intervention strategies, this study developed an age- and sex-structured human-mosquito interaction dynamic model based on data from the largest CHIKF outbreak ever recorded in China, and conducted a targeted analysis of prevention and control strategies. By decomposing the basic reproduction number and examining population heterogeneity, asymptomatic males aged 15-59 years were identified as the core transmission group. Optimal control analysis revealed that the synergistic implementation of three measures (reducing the effective human-to-mosquito transmission rate, reducing the effective mosquito-to-human transmission rate, and suppressing mosquito population density) could reduce the overall infection rate by 95.7586%. Among these, mosquito population suppression should be prioritized as a universal core strategy; however, its protective effect on females aged 60 years and above was relatively weak, warranting particular attention. The study further demonstrated that asymmetric intensity combinations targeting these three intervention pathways, such as intensity profiles of "10%, 90%, 90%" or "60%, 80%, 90%", could achieve effective outbreak control. This research elucidates population-specific transmission patterns and key pathways for intervention intensity, providing a theoretical and strategic foundation for the precise control of mosquito-borne diseases. It also provides actionable operational insights to support rapid response and strategy optimization for future emerging outbreaks.
Author summary: CHIKF is a mosquito-borne viral disease that is gradually spreading from tropical regions to other areas.
To achieve more precise control of this disease, we developed an age- and sex-structured analytical model based on the largest CHIKF outbreak in China, aiming to provide a scientific basis for responding to potential future outbreaks with inherent uncertainties. The study found that asymptomatic males aged 15-59 years were the primary drivers of transmission and should be prioritized as a key population for reducing viral spread in prevention efforts. When evaluating the effectiveness of different intervention strategies, females aged 60 years and above were the least affected by the implemented measures, indicating that this group should strengthen personal protection to lower their infection risk. Among all control measures, mosquito suppression was the most effective, suggesting that vector control strategies should be prioritized in future outbreak responses.
Yamauchi, M.; Murata, Y.; Niina, T.; Takada, S.
There is a growing demand for molecular dynamics simulations to explore longer timescale behavior of giant protein-DNA complexes such as chromatin. To address this need, we extended OpenCafeMol, a GPU-accelerated residue-level coarse-grained molecular dynamics simulator originally developed for proteins and lipids, to support 3SPN.2 and 3SPN.2C DNA models. We also implemented a hydrogen-bond-type many-body potential to model DNA-protein interactions more accurately. To further improve computational efficiency, we introduced a localized scheme for calculating base-pairing and cross-stacking interactions. Benchmark tests show that OpenCafeMol on a single GPU achieves up to 200-fold speed-up for DNA-only systems and up to 100-fold speed-up for DNA-protein complexes compared to CPU-based simulations. To demonstrate the capability of our implementation for long-timescale biological processes, we simulated an archaeal SMC-ScpA complex undergoing DNA translocation via segment capture (a proposed mechanism for DNA loop extrusion) in the presence of a DNA-bound obstacle. We observed continuous captured-loop growth accompanied by obstacle bypass within the segment capture framework.
Subramanian, N.; Kumar, S. P.; Rengaswamy, R.; Bhatt, N. P.; Narayanan, M.
Predicting cellular behaviors, a central task in systems biology and metabolic engineering, can be enhanced through integrative modeling of processes such as gene regulation and metabolism. Information flow from gene regulation (modeled via a gene regulatory network) to metabolism (modeled via a genome-scale metabolic model) is well-studied, but the reciprocal regulation of genes by metabolites is less explored. We introduce CausalFlux, a method that models bidirectional feedback between genes and metabolites, in order to predict steady-state reaction fluxes under wild-type (WT) or perturbed (e.g., gene knockout/KO) conditions. CausalFlux does so by iteratively performing causal surgery on a Bayesian gene regulatory network and constraint-based analysis of a coupled metabolic model. CausalFlux enabled us to assess the impact of two-way feedback in several testbed models and real-world biological systems by comparing its predictions to those of TRIMER, a state-of-the-art model of gene-to-metabolite one-way feedback. Incorporating bidirectional feedback, as in CausalFlux, improved the Spearman correlation between actual and predicted fluxes in 92% of the 39 distinct simulation conditions relative to TRIMER. For predicting growth/no-growth phenotype following single-gene KOs in E. coli, CausalFlux achieved a balanced accuracy of 0.79 in identifying essential genes, and TRIMER achieved 0.71 for the same task, again highlighting the importance of modeling two-way feedback. In ablation studies that further dissect the role of specific metabolite→gene feedback edges in E. coli, the F1 scores of gene essentiality predictions decreased by 7.5% and 13% upon ablation of feedback edges from any metabolite to the crp gene and the 10 metabolic feedback genes with the highest influence on the KO genes, respectively. Finally, we highlight the application of CausalFlux to predict the essentiality of several hundred genes under different media conditions.
Overall, our findings show that CausalFlux can crucially utilize information on feedback metabolites to predict trends in reaction fluxes and qualitative (growth/no-growth) outcomes, thereby encouraging future systems modeling efforts to carefully incorporate not only gene-to-metabolite but also metabolite-to-gene interactions.
Availability: Code pertaining to the CausalFlux method, and its benchmarking and application, is publicly available at: https://github.com/BIRDSgroup/CausalFlux.
Author summary: The myriad processes within a living cell, such as gene regulation or metabolism, are tightly interconnected. Modeling these interconnected processes can offer a deeper mechanistic understanding of cellular behaviors, as well as guide efforts that engineer the metabolic output of a cell. In this work, we develop a novel integrated model of gene regulation and metabolism that incorporates bidirectional feedback between these two processes, via the concept of metabolite-induced causal surgery on a gene regulatory network and gene-induced constraints on the fluxes of metabolic reactions. Our model, which we call CausalFlux, represents an advance over most existing models that capture just the one-way gene-to-metabolism feedback (i.e., genes coding for enzymes that control metabolic reactions). Our CausalFlux methodology opens up a unique opportunity to quantify the impact of two-way feedback in gene-metabolite systems, via comparison of CausalFlux's predictions to those of TRIMER, a published model incorporating one-way feedback alone. For predicting reaction fluxes in testbed models and essential genes in E. coli, quantitative comparison of the performance of CausalFlux vs. TRIMER showed that accounting for two-way feedback leads to more accurate and biologically meaningful predictions. CausalFlux also enabled us to quantify the effect of two-way feedback by comparing prediction performance before and after ablation of certain feedback edges from metabolites to genes.
Overall, our findings highlight the importance of modeling gene regulation and metabolism as two-way interconnected systems within a living cell, and encourage future works to incorporate gene↔metabolite feedback into their analyses.
Ren, Y.; Morlot, L.; Andrews, J. O.; Thrane Hertz, E. P.; Mailand, N.; Caicedo, J. C.
Recent advances in cell segmentation successfully produce models that generalize across various cell-lines and imaging types. However, these methods still fail to recognize subcellular structures such as micronuclei (MN), which are rare and tiny DNA-containing structures found outside of the main nucleus and observable under the microscope. While they can be hard to recognize in images, studying MN formation is of great interest because of their relationship to chromosome instability, genotoxicity, and cancer progression. Here we present a segmentation model, mnDINO, to segment micronuclei in DNA stained images under diverse experimental conditions with very high efficiency and accuracy. To train this model, we collected a heterogeneous set of images with more than five thousand annotated micronuclei. Trained with this diverse resource, the mnDINO model improves the accuracy of MN segmentation, and exhibits strong generalization across microscopes and cell lines. The dataset, code, and pre-trained model are made publicly available to facilitate future research in MN biology.
Klaus, C.; Sotomayor, M.
Deep learning approaches have revolutionized protein structure prediction. These tools are trained using experimental data and recapitulate reported conformations, but there is great interest in predicting conformations that may be functionally relevant although experimentally underrepresented. Since many modern structure prediction tools use generative artificial intelligence diffusion models, we reframe the search for alternative molecular conformations as that of sampling from a diffusion distribution conditioned on an arbitrary Bayesian likelihood. We implement a twisted diffusion sampler in Boltz-2 to sample this conditioned distribution and demonstrate the utility of this approach, which does not require any additional training of the neural network, by implementing a diffusion analog of steered molecular dynamics simulations applied to mechanical systems. We can reproduce predicted stretched states of fragments of DNA, the muscle protein titin, and the inner-ear protocadherin-15 protein, as well as open states of the MscL ion channel consistent with experimental results. We expect that steered structure predictions will help sample underrepresented and non-equilibrium conformations for many macromolecular systems.
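The conditioning described in this abstract can be written compactly with Bayes' rule. The block below is only a notational sketch: the symbols $p_\theta$, $\ell$, $k$, $d(x)$ and $d_0$ are assumptions introduced here for illustration, and the harmonic restraint is one hypothetical steering likelihood in the spirit of steered molecular dynamics, not necessarily the one the authors use.

```latex
% Requires amsmath.
% p_\theta(x): distribution sampled by the pretrained diffusion model (the prior).
% \ell(y \mid x): user-chosen likelihood that scores structure x against a target y.
\begin{align}
  p(x \mid y) &\propto p_\theta(x)\,\ell(y \mid x), \\
  % Hypothetical example: harmonic restraint pulling a distance d(x) toward d_0
  % with stiffness k (all three symbols are illustrative assumptions).
  \ell(y \mid x) &= \exp\!\Big(-\tfrac{k}{2}\,\big(d(x) - d_0\big)^{2}\Big).
\end{align}
```

Under this reading, sampling from $p(x \mid y)$ reweights the model's own distribution toward structures that satisfy the restraint, which is why no retraining of the network is needed.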
Hernandez Vargas, E. A.
Evolutionary therapies regulate heterogeneous populations by altering selective pressures through treatment sequences in cancer and infections. This letter develops an invariant-set framework for treatment-induced containment based on positive triangular invariant sets. For periodically switched systems, sufficient conditions are derived for the existence of such invariant regions. Robustness with respect to mutation is established by showing that the invariant simplex persists under small perturbations of the subsystem matrices. In the two-phenotype case, the analysis yields an explicit mutation threshold that separates regimes in which therapy cycling maintains containment from regimes in which mutation can enable evolutionary escape. Simulations illustrate the geometry of the invariant sets and the role of mutation and dwell time in containment robustness.
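As a rough numerical illustration of a two-phenotype mutation threshold, the sketch below uses a hypothetical model that is not taken from the letter. It assumes two therapies applied alternately for equal dwell time `dwell`; each therapy kills one phenotype at rate `kill` while the other grows at rate `grow`, and cells switch phenotype symmetrically at rate `mu`. Instead of constructing invariant sets, it uses the per-period growth factor (the spectral radius of the one-period transition matrix) as a stand-in for containment, and finds by bisection the mutation rate at which that factor crosses 1. All function and parameter names are illustrative.

```python
import math

def expm_sym2(p, q, r, t):
    """Closed-form exp(t * [[p, q], [q, r]]) for a symmetric 2x2 matrix."""
    m = 0.5 * (p + r)            # mean of the diagonal
    d = 0.5 * (p - r)            # half the diagonal difference
    s = math.hypot(d, q)         # half the gap between the two eigenvalues
    c = math.cosh(s * t)
    k = math.sinh(s * t) / s if s > 0 else t
    g = math.exp(m * t)
    return [[g * (c + k * d), g * k * q],
            [g * k * q, g * (c - k * d)]]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][n] * b[n][j] for n in range(2)) for j in range(2)]
            for i in range(2)]

def period_growth(kill, grow, mu, dwell):
    """Spectral radius of the one-period map (therapy 1, then therapy 2)."""
    # Therapy 1 (assumed): phenotype 1 dies, phenotype 2 grows.
    e1 = expm_sym2(-kill - mu, mu, grow - mu, dwell)
    # Therapy 2 (assumed): the roles of the phenotypes are swapped.
    e2 = expm_sym2(grow - mu, mu, -kill - mu, dwell)
    m = matmul2(e2, e1)
    half_trace = 0.5 * (m[0][0] + m[1][1])
    disc = (0.5 * (m[0][0] - m[1][1])) ** 2 + m[0][1] * m[1][0]
    # Entries are nonnegative, so the largest eigenvalue is real.
    return half_trace + math.sqrt(max(disc, 0.0))

def mutation_threshold(kill, grow, dwell, mu_hi=0.1, tol=1e-10):
    """Bisection for a mutation rate where the per-period growth factor equals 1."""
    lo, hi = 0.0, mu_hi
    if not (period_growth(kill, grow, lo, dwell) < 1.0
            < period_growth(kill, grow, hi, dwell)):
        raise ValueError("bracket does not straddle a growth factor of 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if period_growth(kill, grow, mid, dwell) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    mu_star = mutation_threshold(kill=1.0, grow=0.8, dwell=5.0)
    print("approximate mutation threshold:", mu_star)
```

With `kill=1.0`, `grow=0.8`, and `dwell=5.0`, the mutation-free growth factor is `exp(-1)`, so cycling contains the population; in this toy model, raising `mu` past the computed threshold lets the total population grow from one period to the next.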
Wang, Y.; Wang, D.; Lau, Y. C.; Du, Z.; Cowling, B. J.; Zhao, Y.; Ali, S. T.
Mainland China experienced multiple waves of the COVID-19 pandemic during 2020-2022, driven by emerging variants and changes in public health and social measures (PHSMs). We developed a hypergraph-based Susceptible-Vaccinated-Exposed-Infectious-Recovered-Susceptible (SVEIRS) model to reconstruct epidemic dynamics across 31 provinces, capturing transmission heterogeneity associated with clustered contacts. We assessed key characteristics of transmission at national and provincial levels during four outbreak periods: initial, localized pre-Delta, Delta, and widespread Omicron, which accounted for 96.7% of all infections. We found significant diversity in transmission contributions across cluster sizes, with a small fraction of larger clusters responsible for a disproportionate share of infections. Counterfactual analyses showed that reducing cluster-size heterogeneity, while holding overall exposure constant, could have lowered national infections by 11.70-30.79%, with the largest effects during the Omicron period. Ascertainment rates increased over time but remained spatially heterogeneous, ranging from 14.40% to 71.93%. Population susceptibility declined following mass vaccination (to 42.49% in Aug 2021, nationally) and rebounded (to 89.89% in Nov 2022) due to waning immunity, with variations across the provinces. Effective reproduction numbers displayed marked temporal and spatial variability, with higher estimates during Omicron. Overall, these results highlight the critical role of group contact heterogeneity in shaping epidemic dynamics.
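One way to build intuition for the counterfactual on cluster-size heterogeneity is to count contact pairs. The sketch below is a deliberate simplification and not the paper's hypergraph SVEIRS model: it assumes that transmission opportunities scale with the number of distinct pairs inside each cluster, and it reads "holding overall exposure constant" as keeping the number of clusters and the total attendance fixed. The cluster sizes and function names are hypothetical.

```python
def within_cluster_pairs(sizes):
    """Total number of distinct contact pairs inside clusters of the given sizes."""
    return sum(k * (k - 1) // 2 for k in sizes)

def equalized(sizes):
    """Same number of clusters and same total attendance, sizes as even as possible."""
    n, total = len(sizes), sum(sizes)
    base, extra = divmod(total, n)
    return [base + 1] * extra + [base] * (n - extra)

# Hypothetical example: four small gatherings and one large one.
heterogeneous = [2, 2, 2, 2, 12]
homogeneous = equalized(heterogeneous)

pairs_het = within_cluster_pairs(heterogeneous)
pairs_hom = within_cluster_pairs(homogeneous)

if __name__ == "__main__":
    print("total attendance:", sum(heterogeneous), "vs", sum(homogeneous))
    print("contact pairs:", pairs_het, "vs", pairs_hom)
```

Because the pair count grows quadratically with cluster size, the single large cluster dominates the heterogeneous total, which mirrors the abstract's observation that a small fraction of larger clusters accounts for a disproportionate share of infections.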